Validity and the social dimension of language testing - Sareh Saatian
In what ways is the social dimension of language assessment reflected in current theories of the validation of language tests? Here we will consider the theories of validation that have most influenced current thinking in our field, in the work of Messick (following Cronbach) and his successors Mislevy and Kane, and its interpretation within language testing by Bachman, Chapelle, Lynch, Kunnan, Shohamy, and others. Contemporary validity theory has developed procedures for supporting the rationality of decisions based on tests and has thus addressed issues of test fairness. However, although validity theory has also begun again to develop ways of thinking about the social dimensions of the use of tests, many issues are still unresolved. In fact, it almost feels as if the ongoing effort to incorporate the social in this latter sense goes against the grain of much validity theory, which remains heavily marked by its origins in the individualist and cognitively oriented field of psychology.

Construct validity

A construct must satisfy two conditions. First, it must be defined in such a way that it becomes measurable. Second, it should be defined in such a way that its relationships with other, different constructs can be established.

Cronbach

Current discussions of validity in educational assessment are influenced by Cronbach, who is called the father of construct validity. Cronbach and Meehl (1955) introduced the term construct validity alongside criterion-related and content validity; in later work construct validity came to be given the central role, subsuming the other types of validity. They also discussed the strengths and weaknesses of this kind of validity.

Messick

Messick explicitly incorporated a social dimension of testing into his model.
Messick, like Cronbach, saw assessment as a process of reasoning and evidence gathering carried out in order to make inferences about individuals, and saw establishing the meaningfulness of these inferences as the task of test development and research. He made the social dimension more concrete by arguing two things. First, our understanding of what we measure, and the things that we prefer to measure, reflect values, which are social and cultural in origin. Second, tests have real effects in the educational and social contexts in which they are used. Messick set out these aspects in the form of a matrix. A test is a procedure for gathering evidence to make decisions about individuals, so the evidence should be collected carefully, and there must be a relationship between the target setting, the test, and the construct.

Mislevy

The work of Mislevy and his colleagues provides analytic clarity to the procedures involved in designing tests. According to Mislevy: "An assessment is a machine for reasoning about what students know, can do, or have accomplished, based on a handful of things they say, do, or make in particular settings." (Mislevy, Steinberg, & Almond, 2003, p. 4) Mislevy does not consider the context in which tests are commissioned and thus cannot problematize the determination of test constructs as a function of their role in the social and policy environment.

Kane’s Approach to Test Score Validation

Kane has also developed a systematic approach to thinking through the process of drawing valid inferences from test scores. Kane points out that we interpret scores as having meaning. The same score might have different interpretations. Kane also pointed out that generalization across tasks is often poor in complex performance assessments: the fact that a person can handle a complex writing task involving one topic and supporting stimulus material does not necessarily mean that the person will perform in a comparable way on another topic and another set of materials.
If task generalizability is weak, or if the impact of raters or the rating process is large, then this “bridge” collapses and we cannot move on in the interpretative argument. Kane thus distinguishes two types of inference (semantic inferences and policy inferences) and two related types of interpretation. Interpretations that involve only semantic inferences are called descriptive interpretations; interpretations involving policy inferences are called decision-based interpretations.

The Social Dimension of Validity in Language Testing

Hymes (1967, 1972) was the first to talk about socially oriented assessment, but the most elaborated and influential discussion of the validity of communicative language tests, Bachman’s landmark Fundamental Considerations in Language Testing (1990), builds its discussion in significant part around Messick’s approach to validity. Most memorably, Bachman (1990) introduced the Bachman model of communicative language ability, reworking and clarifying the earlier interpretations by Canale and Swain (1980) and Canale (1983) of the relevance of the work of Hymes (1972) on first languages to competence in a second language. In communicative language testing, the target of test inferences is performance of a set of communicative tasks in various contexts of use. Bachman characterizes these contexts in two stages. First, he assumes that all contexts have in common that they make specific demands on aspects of test-taker competence. The social context of the target language use situation depends on the language user; there is a close relation between context and ability. The target language use situation is conceptualized in terms of components of communicative language ability, which, in turn, is understood as the ability to handle the target language use situation. The situation or context is projected onto the learner as a demand for a relevant set of cognitive abilities; in turn, these cognitive abilities are read onto the context.
What is absent here is a theory of the social context in its own right, that is, one that is not primarily concerned with the cognitive demands the context places on the test taker. Kane’s integration has certain advantages, in that it sees test use and test consequences as aspects of validity, although the distinction between semantic inferences and policy inferences upholds a separation that survives even into Messick’s matrix, where it reflects, as we have seen, the development of validity theory from its relatively asocial psychometric origins. In the language testing literature, there has been considerable research on test consequences under the headings of test washback, usually referring to the effect on teaching in subject areas that will be examined, and test impact, referring to broader sorts of impacts in the school and beyond.

The Broader Social Context: The Limits of Validity

First Kunnan and then Shohamy were concerned with broader social and policy issues. What is noticeable here is that the context in which language tests are used is presented quite unproblematically; the responsibility of the tester is to make the “beneficial” information in the test as “useful” as possible. How can tests persist in being so powerful, so influential, so domineering, and play such enormous roles in our society? One answer to this question is that tests have become symbols of power for both individuals and society. Bachman is right to suggest that it is not practical to attempt to incorporate the perspective of test takers as “political subjects in a political context” in an interpretative argument. If language testing research is to be more than a technical field, it must, at the very least as a counterpoint to its practical activity, develop an ongoing critique of itself as a site for the articulation and perpetuation of social relations. In this way, language testing research has a chance to move beyond the limits of validity theory and make a proper contribution to the wider discussion of the general and specific functions of tests in contemporary society.